Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove default from some executors #6

Closed

Conversation

mustafasrepo
Copy link
Collaborator

@mustafasrepo mustafasrepo commented Feb 22, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

In great analysis by [MENTION NAME] at the issue 9084. [MENTION NAME] recognized that stack usage (depth) increases a lot during logical and physical planning. The root cause of aggressive stack usage in the logical planning is excessive use of .clone of LogicalPlan enum.

In physical planning the problem stems from recursive function calls in the getter APIs of the Arc<dyn ExecutionPlan>, such as EquivalenceProperties, output_partitioning, output_ordering, etc.

In the PR9084, [MENTION NAME] could reduce physical plan stack usage by caching equivalence_properties for ProjectionExec.

This PR introduces a new struct to cache PlanProperties (PlanPropertiesCache). With this struct, schema, output_partitioning, equivalence_properties, output_ordering is cached. This caching mechanism removes recursive calls during getter methods. Also, given .cache method is implemented, default implementations of the .output_partitioning, .equivalence_properties, output_ordering works out of the box.

With these changes flame graph for the query 54 is converted from following graph
flamegraph_main_q54

to following graph

flamegraph_branch_q54 where stack usage decreases.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

api change

@github-actions github-actions bot added the core label Feb 22, 2024
@mustafasrepo mustafasrepo added the final_review Ozan can review this pr label Feb 22, 2024
mustafasrepo and others added 14 commits February 22, 2024 18:05
* WIP lag/lead ignore nulls

* Support IGNORE NULLS for LAG function

* fmt

* comments

* remove comments

* Add new tests, minor changes, trigger evalaute_all

* Make algorithm pruning friendly

---------

Co-authored-by: Mustafa Akur <[email protected]>
…ery plan (apache#9295)

* fix: issue apache#9213 substitute ArrayAgg to NthValue to optimize query plan

* fix format

* adding type check

* adding test
* chore: statically link xz2

* toml fmt
* fix: issue apache#9327 throw error when incursion happen in dataframe api

* fix
* add support for CopyTo::partition_by in proto

Signed-off-by: Hoang Pham <[email protected]>

* simplify partition_by logic

Signed-off-by: Hoang Pham <[email protected]>

---------

Signed-off-by: Hoang Pham <[email protected]>
* support udf in substrait

Signed-off-by: jayzhan211 <[email protected]>

* clippy

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
@mustafasrepo mustafasrepo force-pushed the feature/remove_default_cache branch from 4707939 to ace9815 Compare February 26, 2024 12:11
Weijun-H and others added 9 commits February 26, 2024 21:10
* support FixedSizeList Type Coercion

* add allow null type coercion parameter

* support null column in FixedSizeList

* Add test

* Add tests for cardinality with fixed size lists

* chore

* fix ci

* add comment

* Fix array_element function signature

* Remove unused imports and simplify code

* Fix array function signatures and behavior

* fix conflict

* fix conflict

* add tests for FixedSizeList

* remove unreacheable null check

* simplify the code

* remove null checking

* reformat output

* simplify code

* add tests for array_dims

* Refactor type coercion functions in datafusion/expr module
…ache#9342)

* feat: expand `unnest`  to accept any single array expression

* unnest null

* review feedback
* fix: downgrade tonic for arrow compatibility

Tonic 0.10 and 0.11 are not API compatible.
Arrow 50 depends on tonic 0.10, and datafusion must match that dependency for compatibility reasons.

* feat: make nested examples runnable

cargo run --example doesn't support nested examples. Nested examples need an explicit block to be runnable.

* fix: fix custom catalog typo and formatting

* docs: add note about upgrading tonic with arrow

* ci: add cargo check for all examples
…pache#9310)

* docs: update parquet_sql_multiple_files.rs with a relative path ex

* style: run cargo fmt

* docs: update comment

* docs: better
* tests: adds tests associated with apache#9237

* style: clippy
* feature: support nvl(ifnull) function

* add sqllogictest

* add docs entry

* Update docs/source/user-guide/sql/scalar_functions.md

Co-authored-by: Jonah Gao <[email protected]>

* fix some code

* fix docs

---------

Co-authored-by: Jonah Gao <[email protected]>
* feat: move abs to datafusion_functions

* fix proto

* fix proto

* fix CI vendored code

* Fix proto

* add support type

* fix signature

* fix typo

* fix test cases

* disable a test case

* remove old code from math_expressions

* feat: add test

* fix clippy

* use unknown for proto

* fix unknown proto
mustafasrepo and others added 18 commits February 27, 2024 19:09
* Initial commit

* Updated mod.rs - Docstrings, Initial test

* Updated mod.rs - Fixed udf test

* Added udaf test, Updated udf test

* Added test for udwf

* Linting with rustfmt

* Update datafusion/core/src/execution/context/mod.rs

Co-authored-by: Andrew Lamb <[email protected]>

* Moved tests to core/tests/user_defined

* fix fmt

---------

Co-authored-by: Andrew Lamb <[email protected]>
…* (1/3 regexpmatch) (apache#9329)

* feat: issue_9285: port builtin reg function into datafusion-function-* crate (1/3: RegexpMatch part)

* fix fmt

* refact

* modify test

* fix msrv verify problem

* port test and delete useless lines
# Conflicts:
#	datafusion/core/src/physical_optimizer/join_selection.rs
#	datafusion/physical-plan/src/aggregates/mod.rs
#	datafusion/physical-plan/src/coalesce_partitions.rs
#	datafusion/physical-plan/src/joins/cross_join.rs
#	datafusion/physical-plan/src/lib.rs
#	datafusion/physical-plan/src/sorts/sort_preserving_merge.rs
#	datafusion/physical-plan/src/unnest.rs
#	datafusion/physical-plan/src/windows/window_agg_exec.rs
#	datafusion/physical-plan/src/work_table.rs
@ozankabak
Copy link
Collaborator

Merged upstream.

@ozankabak ozankabak closed this Feb 29, 2024
@mustafasrepo mustafasrepo deleted the feature/remove_default_cache branch March 27, 2024 06:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.